LX-DSemVectors: Distributional Semantics Models for Portuguese

Authors

  • João António Rodrigues
  • António Branco
  • Steven Neale
  • João Ricardo Silva
Abstract

In this article we describe the creation and distribution of the first publicly available word embeddings for Portuguese. Our embeddings are evaluated on their own and also compared with the original English models on a well-known analogy task. We gathered a large Portuguese corpus of 1.7 billion tokens, developed the first distributional semantic analogies test set for Portuguese, and carried out the first parametrization and evaluation of Portuguese word embedding models.

Similar resources

Estimating Linear Models for Compositional Distributional Semantics

In distributional semantics studies, there is growing attention to compositionally determining the distributional meaning of word sequences. Yet, compositional distributional models depend on a large set of parameters that have not been explored. In this paper we propose a novel approach to estimate parameters for a class of compositional distributional models: the additive models. Our approa...

A relatedness benchmark to test the role of determiners in compositional distributional semantics

Distributional models of semantics capture word meaning very effectively, and they have been recently extended to account for compositionally-obtained representations of phrases made of content words. We explore whether compositional distributional semantic models can also handle a construction in which grammatical terms play a crucial role, namely determiner phrases (DPs). We introduce a new p...

Towards Syntax-aware Compositional Distributional Semantic Models

Compositional Distributional Semantics Models (CDSMs) are traditionally seen as an entirely different world with respect to Tree Kernels (TKs). In this paper, we show that under a suitable regime these two approaches can be regarded as the same and, thus, structural information and distributional semantics can successfully cooperate in CDSMs for NLP tasks. Leveraging on distributed trees, we pres...

Category-theoretic quantitative compositional distributional models of natural language semantics

This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high dimensional spaces. The problem of compositionality for such models concerns itself with how to produc...

Mac-Morpho Revisited: Towards Robust Part-of-Speech Tagging

We present a revision of Mac-Morpho, the biggest corpus of Portuguese text containing manually annotated POS tags. Many errors were corrected, yielding a much more reliable resource. We also trained a neural network based classifier for the POS tagging task, following an architecture that achieves state-of-the-art results in English. Our tagger maps each word to a real valued vector and uses it...

Publication date: 2016